flickr image
Cross-Modal Learning of Housing Quality in Amsterdam
Levering, Alex, Marcos, Diego, Tuia, Devis
In our research we test data and models for the recognition of housing quality in the city of Amsterdam from ground-level and aerial imagery. For ground-level images we compare Google StreetView (GSV) to Flickr images. Our results show that GSV predicts the most accurate building quality scores, approximately 30% better than using only aerial images. However, we find that through careful filtering and by using the right pre-trained model, Flickr image features combined with aerial image features are able to halve the performance gap to GSV features from 30% to 15%. Our results indicate that there are viable alternatives to GSV for liveability factor prediction, which is encouraging as GSV images are more difficult to acquire and not always available.
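The combination of Flickr and aerial features mentioned above can be illustrated with a minimal sketch. The abstract does not say how the two feature sets are combined, so the mean-pooling and concatenation used here are assumptions, and the function names `mean_pool` and `fuse_features` are hypothetical.

```python
def mean_pool(vectors):
    """Average several equal-length feature vectors into one vector."""
    if not vectors:
        raise ValueError("need at least one feature vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all feature vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse_features(aerial, flickr_vectors):
    """Concatenate one aerial feature vector with the pooled Flickr features.

    Assumption: simple concatenation; the paper may fuse features differently.
    """
    return list(aerial) + mean_pool(flickr_vectors)


if __name__ == "__main__":
    # Toy values standing in for features from pre-trained models.
    aerial = [1.0, 2.0]
    flickr = [[0.0, 2.0], [2.0, 4.0]]
    print(fuse_features(aerial, flickr))
```

The fused vector would then be passed to whatever regressor predicts the quality score; that step is omitted because the abstract does not describe it.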
Towards Large-scale Building Attribute Mapping using Crowdsourced Images: Scene Text Recognition on Flickr and Problems to be Solved
Sun, Yao, Kruspe, Anna, Meng, Liqiu, Tian, Yifan, Hoffmann, Eike J, Auer, Stefan, Zhu, Xiao Xiang
Crowdsourced platforms provide huge amounts of street-view images that contain valuable building information. This work addresses the challenges of applying Scene Text Recognition (STR) to crowdsourced street-view images for building attribute mapping. We use Flickr images, particularly examining texts on building facades. A Berlin Flickr dataset is created, and pre-trained STR models are used for text detection and recognition. Manual checking of a subset of STR-recognized images demonstrates high accuracy. We examined the correlation between STR results and building functions, and analysed instances where texts were recognized on residential buildings but not on commercial ones. Further investigation revealed significant challenges associated with this task, including small text regions in street-view images, the absence of ground truth labels, and mismatches between the buildings in Flickr images and the building footprints in OpenStreetMap (OSM). To develop city-wide mapping beyond urban hotspot locations, we suggest differentiating the scenarios where STR proves effective while developing appropriate algorithms or bringing in additional data for handling other cases. Furthermore, interdisciplinary collaboration should be undertaken to understand the motivation behind building photography and labeling. The STR-on-Flickr results are publicly available at https://github.com/ya0-sun/STR-Berlin.
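One simple way to look at the relation between STR results and building functions is to compare, per function, the share of buildings on which text was recognized. The sketch below assumes a hypothetical record format of `(building_function, has_recognized_text)` pairs and uses made-up toy records; it is not the authors' analysis code.

```python
from collections import defaultdict


def text_rate_by_function(buildings):
    """Return the share of buildings with recognized text for each function.

    buildings: iterable of (building_function, has_recognized_text) pairs.
    """
    totals = defaultdict(int)
    with_text = defaultdict(int)
    for function, has_text in buildings:
        totals[function] += 1
        if has_text:
            with_text[function] += 1
    return {function: with_text[function] / totals[function] for function in totals}


if __name__ == "__main__":
    # Hypothetical toy records, not data from the paper.
    records = [
        ("commercial", True),
        ("commercial", True),
        ("commercial", False),
        ("residential", True),
        ("residential", False),
        ("residential", False),
        ("residential", False),
    ]
    for function, rate in text_rate_by_function(records).items():
        print(function, round(rate, 2))
```

Residential buildings with recognized text and commercial buildings without it are the cases the abstract singles out for closer inspection.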
More-flexible machine learning
Machine learning, which is the basis for most commercial artificial-intelligence systems, is intrinsically probabilistic. An object-recognition algorithm asked to classify a particular image, for instance, might conclude that it has a 60 percent chance of depicting a dog, but a 30 percent chance of depicting a cat. At the Annual Conference on Neural Information Processing Systems in December, MIT researchers will present a new way of doing machine learning that enables semantically related concepts to reinforce each other. So, for instance, an object-recognition algorithm would learn to weigh the co-occurrence of the classifications "dog" and "Chihuahua" more heavily than it would the co-occurrence of "dog" and "cat." In experiments, the researchers found that a machine-learning algorithm that used their training strategy did a better job of predicting the tags that human users applied to images on the Flickr website than it did when it used a conventional training strategy.
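The idea that a mistake between related concepts should count for less than a mistake between unrelated ones can be sketched with a distance-weighted cost. This is an illustrative simplification, not the researchers' training method, which the article does not specify; the distance values and the names `SEMANTIC_DISTANCE`, `distance`, and `expected_semantic_cost` are assumptions.

```python
# Hand-picked, hypothetical distances between labels (smaller = more related).
SEMANTIC_DISTANCE = {
    frozenset(["dog", "chihuahua"]): 0.2,
    frozenset(["dog", "cat"]): 1.0,
    frozenset(["chihuahua", "cat"]): 1.0,
}


def distance(a, b):
    """Symmetric semantic distance between two labels; 0 for identical labels."""
    if a == b:
        return 0.0
    return SEMANTIC_DISTANCE[frozenset([a, b])]


def expected_semantic_cost(probs, true_label):
    """Probability-weighted distance between the predicted labels and the true one."""
    return sum(p * distance(label, true_label) for label, p in probs.items())


if __name__ == "__main__":
    # Both predictions give "dog" 60 percent, but spread the rest differently.
    near_miss = {"dog": 0.6, "chihuahua": 0.3, "cat": 0.1}
    far_miss = {"dog": 0.6, "chihuahua": 0.1, "cat": 0.3}
    print(expected_semantic_cost(near_miss, "dog"))
    print(expected_semantic_cost(far_miss, "dog"))
```

A conventional per-class loss would treat the two predictions alike because both assign the same probability to the true label; the distance-weighted cost prefers the one whose remaining mass falls on the related label.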
Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks
You, Quanzeng (University of Rochester) | Luo, Jiebo (University of Rochester) | Jin, Hailin (Adobe Research) | Yang, Jianchao (Adobe Research)
Sentiment analysis of online user-generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems that predict political elections, measure economic indicators, and so on. Social media users are increasingly using images and videos to express their opinions and share their experiences. Sentiment analysis of such large-scale visual content can help better extract user sentiments toward events or topics, such as those in image tweets, so that prediction of sentiment from visual content is complementary to textual sentiment analysis. Motivated by the need to leverage large-scale yet noisy training data to solve the extremely challenging problem of image sentiment analysis, we employ Convolutional Neural Networks (CNN). We first design a suitable CNN architecture for image sentiment analysis. We obtain half a million training samples by using a baseline sentiment algorithm to label Flickr images. To make use of such noisy machine-labeled data, we employ a progressive strategy to fine-tune the deep network. Furthermore, we improve the performance on Twitter images by inducing domain transfer with a small number of manually labeled Twitter images. We have conducted extensive experiments on manually labeled Twitter images. The results show that the proposed CNN can achieve better performance in image sentiment analysis than competing algorithms.
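A progressive strategy for noisy machine labels can be sketched as a filtering step between training rounds: after a first round, keep only the samples the model predicts confidently and in agreement with the noisy label, then fine-tune on that subset. The abstract does not give the selection rule, so the deterministic margin threshold below is an assumption, as are the name `select_confident` and the `min_margin` parameter.

```python
def select_confident(samples, min_margin=0.4):
    """Pick samples to keep for the next fine-tuning round.

    samples: list of (sample_id, noisy_label, p_positive), where noisy_label
    is 0 or 1 and p_positive is the current model's probability of label 1.
    A sample is kept when the prediction agrees with the noisy label and the
    gap between the two class probabilities is at least min_margin.
    """
    kept = []
    for sample_id, noisy_label, p_positive in samples:
        predicted = 1 if p_positive >= 0.5 else 0
        margin = abs(p_positive - (1.0 - p_positive))
        if predicted == noisy_label and margin >= min_margin:
            kept.append(sample_id)
    return kept


if __name__ == "__main__":
    # Hypothetical model outputs on machine-labeled images.
    scored = [
        ("a", 1, 0.9),   # confident and consistent with the label
        ("b", 1, 0.55),  # consistent but not confident
        ("c", 0, 0.8),   # confident but contradicts the label
        ("d", 0, 0.1),   # confident and consistent with the label
    ]
    print(select_confident(scored))
```

The retained subset would be used to continue training the same network; the training loop itself is left out because it depends on the framework and architecture.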